MLP-Aware Dynamic Instruction Window Resizing in Superscalar Processors for Adaptively Exploiting Available Parallelism
نویسندگان
چکیده
Single-thread performance has not improved much over the past few years, despite an ever increasing transistor budget. One of the reasons for this is that there is a speed gap between the processor and main memory, known as the memory wall. A promising method to overcome this memory wall is aggressive out-of-order execution by extensively enlarging the instruction window resources to exploit memory-level parallelism (MLP). However, simply enlarging the window resources lengthens the clock cycle time. Although pipelining the resources solves this problem, it in turn prevents instruction-level parallelism (ILP) from being exploited because issuing instructions requires multiple clock cycles. This paper proposed a dynamic scheme that adaptively resizes the instruction window based on the predicted available parallelism, either ILP or MLP. Specifically, if the scheme predicts that MLP is available during execution, the instruction window is enlarged and the window resources are pipelined, thereby exploitingMLP. Conversely, if the scheme predicts that less MLP is available, that is, ILP is exploitable for improved performance, the instruction window is shrunk and the window resources are de-pipelined, thereby exploiting ILP. Our evaluation results using the SPEC2006 benchmark programs show that the proposed scheme achieves nearly the best performance possible with fixed-size resources. On average, our scheme realizes a performance improvement of 21% over that of a conventional processor, with additional cost of only 6% of the area of the conventional processor core or 3% of that of the entire processor chip. The evaluation results also show 8% better energy efficiency in terms of 1/EDP (energy-delay product). key words: microprocessor, superscalar processor, memory-level parallelism, instruction-level parallelism
منابع مشابه
Very-Wide-Issue Superscalar Microengine Configurations
To continue microprocessor performance improvements made in the last 2 decades, instruction-level parallelism must be exploited across multiple basic block boundaries. This necessity has led to execution engines which dynamically predict a stream of instructions which are executed concurrently. As issue widths increase, former assumptions about requirements for execution resources such as inter...
متن کاملControl and Data Dependence in Multithreaded Processors
Boosting instruction level parallelism in dynamically scheduled processors requires a large instruction window. The approach taken by current superscalar processors to build the instruction window is known to have important limitations, such as the requirement of more powerful instruction fetch mechanisms and the increasing complexity and delay of the issue logic. In this paper we present a nov...
متن کاملAdaptively Speculative Execution for Wide-Issue Superscalar Processors
Abstract In the past, a scheme of Adaptive Branch Trees (ABT) has been proposed for adaptively keeping track of alternative branch paths and to speculatively execute the code on the most likely path with constrained hardware resources. In this paper, we combine the ABT concept with the instruction prefetch by realizing an ABT table to prefetch the most likely path of execution stream codes so a...
متن کاملApplication of instruction analysis/scheduling techniques to resource allocation of superscalar processors
This paper presents the development of instruction analysis/scheduling CAD techniques to measure the distribution of functional unit usage and the micro operation level parallelism (MLP), which together determine the proper functional unit allocation for superscalar microprocessors, such as the x86 microprocessors. The proposed techniques fit in the early design exploration phase in which the t...
متن کاملThe Microarchitecture of Superscalar Processors
Superscalar processing is the latest in a long series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle. This paper discusses the microarchitecture of superscalar processors. We begin with a discussion of the general problem solved by superscalar p...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- IEICE Transactions
دوره 97-D شماره
صفحات -
تاریخ انتشار 2014